{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "AI Explanations on CAIP ",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": []
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.4"
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "mHF9VCProKJN"
      },
      "source": [
        "# AI Explanations: Explaining a tabular data model\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "hZzRVxNtH-zG"
      },
      "source": [
        "## Overview\n",
        "\n",
        "In this tutorial we will perform the following steps:\n",
        "\n",
        "1. Build and train a Keras model.\n",
        "1. Export the Keras model as a TF 1 SavedModel and deploy the model on Cloud AI Platform.\n",
        "1. Compute explainations for our model's predictions using Explainable AI on Cloud AI Platform."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "iN69d4D9Flrh"
      },
      "source": [
        "### Dataset\n",
        "\n",
        "The dataset used for this tutorial was created from a BigQuery Public Dataset: [NYC 2018 Yellow Taxi data](https://console.cloud.google.com/bigquery?filter=solution-type:dataset&q=nyc%20taxi&id=e4902dee-0577-42a0-ac7c-436c04ea50b6&subtask=details&subtaskValue=city-of-new-york%2Fnyc-tlc-trips&project=michaelabel-gcp-training&authuser=1&subtaskIndex=3). "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "Su2qu-4CW-YH"
      },
      "source": [
        "### Objective\n",
        "\n",
        "The goal is to train a model using the Keras Sequential API that predicts how much a customer is compelled to pay (fares + tolls) for a taxi ride given the pickup location, dropoff location, the day of the week, and the hour of the day.\n",
        "\n",
        "This tutorial focuses more on deploying the model to AI Explanations than on the design of the model itself. We will be using preprocessed data for this lab. If you wish to know more about the data and how it was preprocessed please see this [notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/01_bigquery/c_extract_and_benchmark.ipynb).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "rgLXkyHEvTVD"
      },
      "source": [
        "## Before you begin\n",
        "\n",
        "This notebook was written with running in **Google Colabratory** in mind. The notebook will run on **Cloud AI Platform Notebooks** or your local environment if the proper packages are installed.\n",
        "\n",
        "Make sure you're running this notebook in a **GPU runtime** if you have that option. In Colab, select **Runtime** --> **Change runtime type** and select **GPU** for **Hardward Accelerator**.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "TSy-f05IO4LB"
      },
      "source": [
        "### Authenticate your GCP account\n",
        "\n",
        "**If you are using AI Platform Notebooks**, your environment is already\n",
        "authenticated. You should skip this step.\n",
        "\n",
        "**Be sure to change the `PROJECT_ID` below to your project before running the cell!**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "deletable": true,
        "editable": true,
        "id": "4qxwBA4RM9Lu",
        "colab": {}
      },
      "source": [
        "import os\n",
        "\n",
        "PROJECT_ID = \"michaelabel-gcp-training\" \n",
        "os.environ[\"PROJECT_ID\"] = PROJECT_ID"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "fZQUrHdXNJnk"
      },
      "source": [
        "**If you are using Colab**, run the cell below and follow the instructions\n",
        "when prompted to authenticate your account via oAuth. Ignore the error message related to `tensorflow-serving-api`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "deletable": true,
        "editable": true,
        "id": "W9i6oektpgld",
        "colab": {}
      },
      "source": [
        "import sys\n",
        "import warnings\n",
        "warnings.filterwarnings('ignore')\n",
        "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' \n",
        "# If you are running this notebook in Colab, follow the\n",
        "# instructions to authenticate your GCP account. This provides access to your\n",
        "# Cloud Storage bucket and lets you submit training jobs and prediction\n",
        "# requests.\n",
        "\n",
        "if 'google.colab' in sys.modules:\n",
        "  from google.colab import auth as google_auth\n",
        "  google_auth.authenticate_user()\n",
        "  !pip install witwidget --quiet\n",
        "  !pip install tensorflow==1.15.2 --quiet\n",
        "  !gcloud config set project $PROJECT_ID\n",
        "\n",
        "elif \"DL_PATH\" in os.environ:\n",
        "  !sudo pip install tabulate --quiet\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "tT061irlJwkg"
      },
      "source": [
        "### Create a Cloud Storage bucket\n",
        "\n",
        "**The following steps are required, regardless of your notebook environment.**\n",
        "\n",
        "When you submit a training job using the Cloud SDK, you upload a Python package\n",
        "containing your training code to a Cloud Storage bucket. AI Platform runs\n",
        "the code from this package. In this tutorial, AI Platform also saves the\n",
        "trained model that results from your job in the same bucket. You can then\n",
        "create an AI Platform model version based on this output in order to serve\n",
        "online predictions.\n",
        "\n",
        "**Set the name of your Cloud Storage bucket below. It must be unique across all\n",
        "Cloud Storage buckets.**\n",
        "\n",
        "You may also change the `REGION` variable, which is used for operations\n",
        "throughout the rest of this notebook. Make sure to [choose a region where Cloud\n",
        "AI Platform services are\n",
        "available](https://cloud.google.com/ml-engine/docs/tensorflow/regions). Note that you may\n",
        "not use a Multi-Regional Storage bucket for training with AI Platform."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "deletable": true,
        "editable": true,
        "id": "bTxmbDg1I0x1",
        "colab": {}
      },
      "source": [
        "BUCKET_NAME = \"michaelabel-gcp-training-ml\" \n",
        "REGION = \"us-central1\"\n",
        "\n",
        "os.environ['BUCKET_NAME'] = BUCKET_NAME\n",
        "os.environ['REGION'] = REGION"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "fsmCk2dwJnLZ"
      },
      "source": [
        "Run the following cell to create your Cloud Storage bucket if it does not already exist."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "deletable": true,
        "editable": true,
        "id": "160PRO3aJqLD",
        "colab": {}
      },
      "source": [
        "%%bash\n",
        "exists=$(gcloud storage ls | grep -w gs://${BUCKET_NAME}/)\n",        "\n",
        "if [ -n \"$exists\" ]; then\n",
        "   echo -e \"Bucket gs://${BUCKET_NAME} already exists.\"\n",
        "    \n",
        "else\n",
        "   echo \"Creating a new GCS bucket.\"\n",
        "   gcloud storage buckets create --location=${REGION} gs://${BUCKET_NAME}\n",        "   echo -e \"\\nHere are your current buckets:\"\n",
        "   gcloud storage ls\n",        "fi"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PyxoF-iqqD1t",
        "colab_type": "text"
      },
      "source": [
        "### Import libraries for creating model\n",
        "\n",
        "Import the libraries we'll be using in this tutorial. **This tutorial has been tested with TensorFlow 1.15.2.**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MEDlLSWK15UL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%tensorflow_version 1.x\n",
        "import tensorflow as tf \n",
        "import tensorflow.feature_column as fc\n",
        "import pandas as pd\n",
        "import numpy as np \n",
        "import json\n",
        "import time\n",
        "\n",
        "# Should be 1.15.2\n",
        "print(tf.__version__)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "aRVMEU2Qshm4"
      },
      "source": [
        "## Downloading and preprocessing data\n",
        "\n",
        "In this section you'll download the data to train and evaluate your model from a public GCS bucket. The original data has been preprocessed from the public BigQuery dataset linked above.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "v7HLNsvekxvz",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%%bash\n",
        "# Copy the data to your notebook instance\n",
        "mkdir taxi_preproc\n",
        "gcloud storage cp --recursive gs://cloud-training/bootcamps/serverlessml/taxi_preproc/*_xai.csv ./taxi_preproc\n",        "ls -l taxi_preproc"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "8zr6lj66UlMn"
      },
      "source": [
        "### Read the data with Pandas\n",
        "\n",
        "We'll use Pandas to read the training and validation data into a `DataFrame`. We will only use the first 7 columns of the csv files for our models."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "deletable": true,
        "editable": true,
        "id": "Icz22E69smnD",
        "colab": {}
      },
      "source": [
        "CSV_COLUMNS = ['fare_amount', 'dayofweek', 'hourofday', 'pickuplon',\n",
        "             'pickuplat', 'dropofflon', 'dropofflat']\n",
        "\n",
        "DAYS = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']\n",
        "DTYPES = ['float32', 'str' , 'int32', 'float32' , 'float32' , 'float32' , 'float32' ]\n",
        "\n",
        "def prepare_data(file_path):\n",
        "\n",
        "  df = pd.read_csv(file_path, usecols = range(7), names = CSV_COLUMNS,\n",
        "                   dtype = dict(zip(CSV_COLUMNS, DTYPES)), skiprows=1)\n",
        "  \n",
        "  labels = df['fare_amount'] \n",
        "  df = df.drop(columns=['fare_amount'])\n",
        "\n",
        "  df['dayofweek'] = df['dayofweek'].map(dict(zip(DAYS, range(7)))).astype('float32')\n",
        "\n",
        "  return df, labels\n",
        "\n",
        "train_data, train_labels = prepare_data('./taxi_preproc/train_xai.csv')\n",
        "valid_data, valid_labels = prepare_data('./taxi_preproc/valid_xai.csv')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vxZryg4xmdy0",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Preview the first 5 rows of training data\n",
        "train_data.head()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kV_NEAQwwH0e",
        "colab_type": "text"
      },
      "source": [
        "## Build, train, and evaluate our model with Keras \n",
        "\n",
        "We'll use `tf.Keras` to build a our ML model that takes our features as input and predicts the fare amount. \n",
        "\n",
        "But first, we will do some feature engineering. We will be utilizing `tf.feature_column` and `tf.keras.layers.Lambda` to implement our feature engineering in the model graph to simplify our `serving_input_fn` later."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "HCQFzd_YdwLX",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create functions to compute engineered features in later Lambda layers\n",
        "def euclidean(params):\n",
        "  lat1, lon1, lat2, lon2 = params\n",
        "  londiff = lon2 - lon1\n",
        "  latdiff = lat2 - lat1\n",
        "  return tf.sqrt(londiff*londiff + latdiff*latdiff)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3kQz8Q0DsBM7",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "NUMERIC_COLS = ['pickuplon', 'pickuplat', 'dropofflon', 'dropofflat', 'hourofday', 'dayofweek']\n",
        "\n",
        "def transform(inputs):\n",
        "\n",
        "  transformed = inputs.copy()\n",
        "\n",
        "  transformed['euclidean'] = tf.keras.layers.Lambda(euclidean, name='euclidean')([\n",
        "              inputs['pickuplat'],\n",
        "              inputs['pickuplon'],\n",
        "              inputs['dropofflat'],\n",
        "              inputs['dropofflon']])\n",
        "  \n",
        "  feat_cols = {colname: fc.numeric_column(colname)\n",
        "           for colname in NUMERIC_COLS}\n",
        "\n",
        "  feat_cols['euclidean'] = fc.numeric_column('euclidean')\n",
        "\n",
        "  print(\"BEFORE TRANSFORMATION\")\n",
        "  print(\"INPUTS:\", inputs.keys())\n",
        "\n",
        "  print(\"AFTER TRANSFORMATION\")\n",
        "  print(\"TRANSFORMED:\", transformed.keys())\n",
        "  print(\"FEATURES\", feat_cols.keys()) \n",
        "\n",
        "  return transformed, feat_cols\n",
        "\n",
        "def build_model():\n",
        "\n",
        "  raw_inputs = {\n",
        "          colname : tf.keras.layers.Input(name=colname, shape=(), dtype='float32')\n",
        "            for colname in NUMERIC_COLS\n",
        "      }\n",
        "  \n",
        "  transformed, feat_cols = transform(raw_inputs)\n",
        "\n",
        "  dense_inputs = tf.keras.layers.DenseFeatures(feat_cols.values(),\n",
        "                                               name = 'dense_input')(transformed)\n",
        "\n",
        "  h1 = tf.keras.layers.Dense(64, activation='relu', name='h1')(dense_inputs)\n",
        "  h2 = tf.keras.layers.Dense(32, activation='relu', name='h2')(h1)\n",
        "  output = tf.keras.layers.Dense(1, activation='linear', name = 'output')(h2)\n",
        "\n",
        "  model = tf.keras.models.Model(raw_inputs, output)\n",
        "\n",
        "  return model\n",
        "\n",
        "model = build_model()\n",
        "model.summary()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UvAcjSUcs_l7",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Compile the model and see a summary\n",
        "optimizer = tf.keras.optimizers.Adam(0.001)\n",
        "\n",
        "model.compile(loss='mean_squared_error', optimizer=optimizer,\n",
        "              metrics = [tf.keras.metrics.RootMeanSquaredError()])\n",
        "\n",
        "tf.keras.utils.plot_model(model, to_file='model_plot.png', show_shapes=True, \n",
        "                          show_layer_names=True, rankdir=\"TB\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GcOkuHPVwjiM",
        "colab_type": "text"
      },
      "source": [
        "### Create an input data pipeline with tf.data\n",
        "\n",
        "Per best practices, we will use `tf.Data` to create our input data pipeline. Our data is all in an in-memory dataframe, so we will use `tf.data.Dataset.from_tensor_slices` to create our pipeline."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZUu9wFklwmm6",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "def load_dataset(features, labels, mode):\n",
        "\n",
        "  dataset = tf.data.Dataset.from_tensor_slices(({\"dayofweek\" : features[\"dayofweek\"],\n",
        "                                                 \"hourofday\" : features[\"hourofday\"],\n",
        "                                                 \"pickuplat\" : features[\"pickuplat\"],\n",
        "                                                 \"pickuplon\" : features[\"pickuplon\"],\n",
        "                                                 \"dropofflat\" : features[\"dropofflat\"],\n",
        "                                                 \"dropofflon\" : features[\"dropofflon\"]},\n",
        "                                                  labels\n",
        "                                                    ))\n",
        "\n",
        "\n",
        "  if mode == tf.estimator.ModeKeys.TRAIN:\n",
        "    dataset = dataset.repeat().batch(256).shuffle(256*10)\n",
        "  else:\n",
        "    dataset = dataset.batch(256)\n",
        "\n",
        "  return dataset.prefetch(1)\n",
        "\n",
        "\n",
        "train_dataset = load_dataset(train_data, train_labels, tf.estimator.ModeKeys.TRAIN)\n",
        "valid_dataset = load_dataset(valid_data, valid_labels, tf.estimator.ModeKeys.EVAL)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l98aRzfPwo5e",
        "colab_type": "text"
      },
      "source": [
        "### Train the model\n",
        "\n",
        "Now we train the model. We will specify a number of epochs which to train the model and tell the model how many steps to expect per epoch."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "h1x_8CR0wtRs",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "tf.keras.backend.get_session().run(tf.tables_initializer(name='init_all_tables'))\n",
        "\n",
        "steps_per_epoch = 426433 // 256\n",
        "\n",
        "model.fit(train_dataset, steps_per_epoch=steps_per_epoch, validation_data=valid_dataset, epochs=10)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bIh6uds2x2tr",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Send test instances to model for prediction\n",
        "predict = model.predict(valid_dataset, steps = 1)\n",
        "predict[:5]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "gAO6-zv6osJ8"
      },
      "source": [
        "## Export the model as a TF 1 SavedModel\n",
        "\n",
        "In order to deploy our model in a format compatible with AI Explanations, we'll follow the steps below to convert our Keras model to a TF Estimator, and then use the `export_saved_model` method to generate the SavedModel and save it in GCS."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fbvzBm1lji7b",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "## Convert our Keras model to an estimator\n",
        "keras_estimator = tf.keras.estimator.model_to_estimator(keras_model=model, model_dir='export')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "KLM43L2FjmFu",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "print(model.input)\n",
        "\n",
        "# We need this serving input function to export our model in the next cell\n",
        "serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(\n",
        "    model.input\n",
        ")\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JBA8ejrJjnLB",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "export_path = keras_estimator.export_saved_model(\n",
        "  'gs://' + BUCKET_NAME + '/explanations',\n",
        "  serving_input_receiver_fn=serving_fn\n",
        ").decode('utf-8')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-f8elyM8KMNX",
        "colab_type": "text"
      },
      "source": [
        "Use TensorFlow's `saved_model_cli` to inspect the model's SignatureDef. We'll use this information when we deploy our model to AI Explanations in the next section."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yFg5r-7s1BKr",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!saved_model_cli show --dir $export_path --all"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "y270ZNinycoy",
        "colab_type": "text"
      },
      "source": [
        "## Deploy the model to AI Explanations\n",
        "\n",
        "In order to deploy the model to Explanations, we need to generate an `explanations_metadata.json` file and upload this to the Cloud Storage bucket with our SavedModel. Then we'll deploy the model using `gcloud`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cUdUVjjGbvQy",
        "colab_type": "text"
      },
      "source": [
        "### Prepare explanation metadata\n",
        "\n",
        "We need to tell AI Explanations the names of the input and output tensors our model is expecting, which we print below. \n",
        "\n",
        "The value for `input_baselines` tells the explanations service what the baseline input should be for our model. Here we're using the median for all of our input features. That means the baseline prediction for this model will be the fare our model predicts for the median of each feature in our dataset. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UolAW3lcVTGl",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Print the names of our tensors\n",
        "print('Model input tensors: ', model.input)\n",
        "print('Model output tensor: ', model.output.name)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qpZiW9Cq6IY4",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "baselines_med = train_data.median().values.tolist()\n",
        "baselines_mode = train_data.mode().values.tolist()\n",
        "print(baselines_med)\n",
        "print(baselines_mode)\n",
        "\n",
        "explanation_metadata = {\n",
        "    \"inputs\": {\n",
        "      \"dayofweek\": {\n",
        "        \"input_tensor_name\": \"dayofweek:0\",\n",
        "        \"input_baselines\": [baselines_mode[0][0]] # Thursday\n",
        "      },\n",
        "      \"hourofday\": {\n",
        "        \"input_tensor_name\": \"hourofday:0\",\n",
        "        \"input_baselines\": [baselines_mode[0][1]] # 8pm\n",
        "      },\n",
        "      \"dropofflon\": {\n",
        "        \"input_tensor_name\": \"dropofflon:0\",\n",
        "        \"input_baselines\": [baselines_med[4]] \n",
        "      },\n",
        "      \"dropofflat\": {\n",
        "        \"input_tensor_name\": \"dropofflat:0\",\n",
        "        \"input_baselines\": [baselines_med[5]] \n",
        "      },\n",
        "      \"pickuplon\": {\n",
        "        \"input_tensor_name\": \"pickuplon:0\",\n",
        "        \"input_baselines\": [baselines_med[2]] \n",
        "      },\n",
        "      \"pickuplat\": {\n",
        "        \"input_tensor_name\": \"pickuplat:0\",\n",
        "        \"input_baselines\": [baselines_med[3]] \n",
        "      },\n",
        "    },\n",
        "    \"outputs\": {\n",
        "      \"dense\": {\n",
        "        \"output_tensor_name\": \"output/BiasAdd:0\"\n",
        "      }\n",
        "    },\n",
        "  \"framework\": \"tensorflow\"\n",
        "  }\n",
        "\n",
        "print(explanation_metadata)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rT3iG5pDdrHi",
        "colab_type": "text"
      },
      "source": [
        "Since this is a regression model (predicting a numerical value), the baseline prediction will be the same for every example we send to the model. If this were instead a classification model, each class would have a different baseline prediction."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "b6dyTQ1e9Tan",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Write the json to a local file\n",
        "with open('explanation_metadata.json', 'w') as output_file:\n",
        "  json.dump(explanation_metadata, output_file)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zmVJKgch6PYJ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!gcloud storage cp explanation_metadata.json $export_path"      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "J6MKKy6Xb2MT",
        "colab_type": "text"
      },
      "source": [
        "### Create the model\n",
        "\n",
        "Now we will create out model on Cloud AI Platform if it does not already exist."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "S2OaOycmb4o0",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "MODEL = 'taxifare_explain'\n",
        "os.environ[\"MODEL\"] = MODEL"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0bwCxEr5b8BP",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "%%bash\n",
        "exists=$(gcloud ai-platform models list | grep ${MODEL})\n",
        "\n",
        "if [ -n \"$exists\" ]; then\n",
        "   echo -e \"Model ${MODEL} already exists.\"\n",
        "    \n",
        "else\n",
        "   echo \"Creating a new model.\"\n",
        "   gcloud ai-platform models create ${MODEL}\n",
        "fi"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qp4qfnZib-zQ",
        "colab_type": "text"
      },
      "source": [
        "### Create the model version \n",
        "\n",
        "Creating the version will take ~5-10 minutes. Note that your first deploy may take longer."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LQlcQFG_AB4o",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Each time you create a version the name should be unique\n",
        "import datetime\n",
        "now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
        "VERSION_IG = 'v_IG_{}'.format(now)\n",
        "VERSION_SHAP = 'v_SHAP_{}'.format(now)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3l5t2o1t7dal",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create the version with gcloud\n",
        "!gcloud beta ai-platform versions create $VERSION_IG \\\n",
        "--model $MODEL \\\n",
        "--origin $export_path \\\n",
        "--runtime-version 1.15 \\\n",
        "--framework TENSORFLOW \\\n",
        "--python-version 3.7 \\\n",
        "--machine-type n1-standard-4 \\\n",
        "--explanation-method 'integrated-gradients' \\\n",
        "--num-integral-steps 25\n",
        "\n",
        "!gcloud beta ai-platform versions create $VERSION_SHAP \\\n",
        "--model $MODEL \\\n",
        "--origin $export_path \\\n",
        "--runtime-version 1.15 \\\n",
        "--framework TENSORFLOW \\\n",
        "--python-version 3.7 \\\n",
        "--machine-type n1-standard-4 \\\n",
        "--explanation-method 'sampled-shapley' \\\n",
        "--num-paths 50"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "eWkkRFhEMbFa",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Make sure the model deployed correctly. State should be `READY` in the following log\n",
        "!gcloud ai-platform versions describe $VERSION_IG --model $MODEL\n",
        "!echo \"---\"\n",
        "!gcloud ai-platform versions describe $VERSION_SHAP --model $MODEL"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "deletable": true,
        "editable": true,
        "id": "JzevJps9IOcU"
      },
      "source": [
        "## Getting predictions and explanations on deployed model\n",
        "\n",
        "Now that your model is deployed, you can use the AI Platform Prediction API to get feature attributions. We'll pass it a single test example here and see which features were most important in the model's prediction. Here we'll use `gcloud` to call our deployed model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CJ-2ErWJDvcg",
        "colab_type": "text"
      },
      "source": [
        "### Format our request for gcloud\n",
        "\n",
        "To use gcloud to make our AI Explanations request, we need to write the JSON to a file. Our example here is for a ride from the Google office in downtown Manhattan to LaGuardia Airport at 5pm on a Tuesday afternoon.\n",
        "\n",
        "Note that we had to write our day of the week at \"3\" instead of \"Tue\" since we encoded the days of the week outside of our model and serving input function."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "D_PR2BcHD40-",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Format data for prediction to our model\n",
        "!rm taxi-data.txt\n",
        "!touch taxi-data.txt\n",
        "prediction_json = {\"dayofweek\": \"3\", \"hourofday\": \"17\", \"pickuplon\": \"-74.0026\", \"pickuplat\": \"40.7410\", \"dropofflat\": \"40.7790\", \"dropofflon\": \"-73.8772\"}\n",
        "with open('taxi-data.txt', 'a') as outfile:\n",
        "  json.dump(prediction_json, outfile)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "KCiaSn28VprW",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Preview the contents of the data file\n",
        "!cat taxi-data.txt"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kw7_f9QVD8Y_",
        "colab_type": "text"
      },
      "source": [
        "### Making the explain request\n",
        "\n",
        "Now we make the explaination requests. We will go ahead and do this here for both integrated gradients and SHAP using the prediction JSON from above."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LoxmlGtWD62A",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "resp_obj = !gcloud beta ai-platform explain --model $MODEL --version $VERSION_IG --json-instances='taxi-data.txt'\n",
        "response_IG = json.loads(resp_obj.s)\n",
        "resp_obj"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "pZ0NhAg0h5gH",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "resp_obj = !gcloud beta ai-platform explain --model $MODEL --version $VERSION_SHAP --json-instances='taxi-data.txt'\n",
        "response_SHAP = json.loads(resp_obj.s)\n",
        "resp_obj"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0nKR8RelNnkK",
        "colab_type": "text"
      },
      "source": [
        "### Understanding the explanations response\n",
        "\n",
        "First let's just look at the difference between our predictions using our baselines and our predicted taxi fare for the example."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "825KoNgHR-tv",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "explanations_IG = response_IG['explanations'][0]['attributions_by_label'][0]\n",
        "explanations_SHAP = response_SHAP['explanations'][0]['attributions_by_label'][0]\n",
        "\n",
        "predicted = round(explanations_SHAP['example_score'], 2)\n",
        "baseline = round(explanations_SHAP['baseline_score'], 2 )\n",
        "print('Baseline taxi fare: ' + str(baseline) + ' dollars')\n",
        "print('Predicted taxi fare: ' + str(predicted) + ' dollars')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QmObtmXIONDp",
        "colab_type": "text"
      },
      "source": [
        "Next let's look at the feature attributions for this particular example. Positive attribution values mean a particular feature pushed our model prediction up by that amount, and vice versa for negative attribution values. Which features seem like they're the most important...well it seems like the location features are the most important!"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "6HKvAImeM_qi",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from tabulate import tabulate\n",
        "\n",
        "feature_names = valid_data.columns.tolist()\n",
        "attributions_IG = explanations_IG['attributions']\n",
        "attributions_SHAP = explanations_SHAP['attributions']\n",
        "rows = []\n",
        "for feat in feature_names:\n",
        "  rows.append([feat, prediction_json[feat], attributions_IG[feat], attributions_SHAP[feat]])\n",
        "print(tabulate(rows,headers=['Feature name', 'Feature value', 'Attribution value (IG)', 'Attribution value (SHAP)']))"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}
